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^O . Abstract. Reliable calculations of the structure of heavy elements are crucial to address 

fundamental science questions such as the origin of the elements in the universe. Applications 
relevant for energy production, medicine, or national security also rely on theoretical predictions 
of basic properties of atomic nuclei. Heavy elements are best described within the nuclear density 

["**■ ' functional theory (DFT) and its various extensions. While relatively mature, DFT has never 

>— ^ , been implemented in its full power, as it relies on a very large number (~ 10® — lO'^^) of expensive 

calculations (~ day). The advent of leadership-class computers, as well as dedicated large-scale 
collaborative efforts such as the SciDAC 2 UNEDF project, have dramatically changed the 
field. This article gives an overview of the various computational challenges related to the 

1 J ■ nuclear DFT, as well as some of the recent achievements. 

1. Introduction 

Nuclear physics pervades a number of scientific disciplines as well as societal applications. 
Understanding the production of the elements in stellar interiors is key to reproduce the observed 
isotopic abundances in the universe. It necessitates an accurate knowledge of the structure of 
radioactive nuclides that are so short-lived that they will remain beyond reach of experimental 
facilities for many years to come, if not for ever. In the long term, the safety and viability of 
nuclear energy production sources will be enhanced by acquiring a precise understanding of the 
complex mechanisms of fission and fusion. The latter are also critical for stockpile stewardship, 
which has broad implications for national security. 

At the heart of these important issues lies the elusive structure of the atomic nucleus. 
From a physics standpoint, it is a quantum many-body problem facing three major theoretical 
challenges: (1) the interaction that binds neutrons and protons together in the nucleus is in 
principle derived from quantum chromodynamics (QCD), but this derivation has not been firmly 
established yet; (2) the interaction between nucleons inside the nucleus is very different from 
the one between isolated nucleons in the vacuum (in- medium interactions); (3) the number 



of constituents in the nucleus (~l-300) almost always forbids both exact analytical solutions, 
except for the lightest systems, as well as the use of statistical methods applicable to systems 
with a very large number of particles. In spite of these formidable difficulties, there has been 
significant progress over the past 50 years to address all these issues. 

While there exist many excellent models for light nuclei, heavy elements can be described only 
by what is variously known as the nuclear self-consistent mean-field theory or, more recently, 
density functional theory (DFT) [Ij. Since its inception in the 1950ies, DFT has reached a 
satisfactory level of maturity. Until now, however, computational limitations did not allow to 
implement the theory as originally designed, resulting in uncontrolled systematic errors, poor 
precision, and dubious reliability in regions of exotic nuclei where experimental information is 
scarce or nonexistent. The fast development of leadership-class computers has for the first time 
lifted many of these limitations, and the solution to long-standing problems seems now possible 
in the short term. After briefly introducing the underlying theoretical background, this article 
discusses some of the computational challenges and methods used in nuclear DFT, highlights 
some of the recent achievements, and discusses current open problems. 

2. Theoretical Models of Heavy Nuclei 

The central hypothesis of nuclear DFT is that the A nucleons (protons and neutrons) inside the 
nucleus can be treated as independent quasi-particles moving in an average nuclear potential 
well. The theory can be entirely formulated by introducing the so-called one-body density matrix 
p{x,x') and pairing tensor k{x,x'), where x = {r,a) includes spatial as well as spin coordinates, 
a = ±1/2. Requiring that the total energy E of the nucleus is minimal under a variation of 
both p and k leads to the so-called Hartree-Fock-Bogoliubov (HFB) equations: 

, / h[pix,x')] - A A[k{x,x')] \ ( U,,{x') \_^( U^{x) \ ,, _ . , ^ 

"^"^ V -A*Kx,x')] -h*[p{x,x')] + \ ) \ V,{x') )~^^\ y^(x) )' ^-^'•••'+°0' 

(1) 
with A a Lagrange parameter that must be introduced to conserve particle number, and 

p(x,x') = ^F;(x)y^(x'), 

(2) 
k{x,x') = Y,V;{x)U^{x'). 

In Eq. ([T]), h[p\ is a Hermitian operator (mean-field), which is a functional of the density matrix, 
and A[k] is an antisymmetric operator (pairing field), which is a functional of the pairing tensor. 
The explicit dependence of the HFB matrix on the eigenfunctions (C/^, V^) via the density matrix 
and pairing tensor, or self-consistency, makes the eigenvalue problem highly nonlinear. 
In its most general form, the mean-field operator reads: 

h = -^V^ + T{x,x') (3) 

with h the Planck constant, m the mass of a nucleon, V the gradient operator, and F the so- 
called Hartree-Fock potential. In phenomenological mean-field models, F is in fact parametrized 
by some suitable mathematical function and does not depend on the density matrix. In the 
traditional version of the self-consistent mean-field theory, F is instead computed from a local 
two-body interaction, or effective pseudo-potential, V{x,x'), which depends on the spatial and 
spin coordinates x and x' of two nucleons and takes the general form 

T{x,x') = 5{x — x') I dxiV{x,xi)p{xi,xi) — V{x,x')p{x,x'). (4) 



Standard two-body interactions have either zero-range, that is, V{x, x') ~ 5{r — r') (Skyrme 
forces), or finite-range, V{x,x') ~ V{r — r') (Gogny forces), which may further simphfy the 
general expression @ . They ah contain a density-dependent term that is necessary to reproduce 
the saturation of nuclear matter but that makes them ill-behaved for extensions of the theory 
dealing with large amplitude collective motion [21 [3] . Recent formulations of nuclear DFT do not 
consider explicitly effective interactions and instead parametrize T[x^x') directly as a functional 
of the density matrix p(a;,x'), or alternatively the local density p{x) and its spatial derivatives. 
The success of the nuclear mean-field theory relies on the mechanism of spontaneous 
symmetry breaking, whereby the solutions of the HFB equations ([1]) may break some of the 
symmetries of the underlying effective interaction V{x,x'). This mechanism can be viewed as 
a way to introduce correlations in what is otherwise an independent particle model. A simple 
example is the breaking of rotational invariance of V{x,x'), which implies that the density 
matrix and pairing tensor can have nonisotropic spatial distributions: in the mean-field theory, 
nuclei can be deformed, and the energy of the nucleus therefore depends on the deformation. 
However, this dependence is not known beforehand. In practice, one must introduce constraint 
operators Q«m to probe the deformation energy surface. The problem is complicated by the fact 
that the expectation value {Qim) of the (local) constraint operator Qim is itself a functional of 
the density matrix, 

(Qlm) = / dxQlm{x)p{x,x). (5) 



The HFB equations ([T]) with the constraints ([5]) represent a system of coupled, nonlinear, 
integro-differential equations and are the cornerstone of the description of heavy elements in 
a microscopic framework. In the following, we discuss the various methods used to solve these 
equations, as well as the related mathematical and computational challenges. 

3. Mathematical and Computational Challenges 

From a computational perspective, nuclear DFT has two facets: (i) the HFB solver itself and 
(ii) the management of a large number of quasi- independent, time-consuming, load-imbalanced 
tasks. We discuss below each of these aspects. 

3.1. HFB Solver 

Solving the HFB Equations - There exist essentially two classes of methods to solve 
the HFB equations. In the coordinate representation, the equations ([T]) are solved directly by 
numerical integration for each eigenfunction p. Boundary conditions for the wave- functions 
are imposed on the domain of integration. While precise, the feasibility and usability of 
this approach are highly dependent on the symmetries of the wave functions: in spherical 
symmetry, the eigenfunctions are separable (?7^(r), l/^(r)) = {u^{r),v^{r))Yim{9,f)- The HFB 
equations depend only on the radial coordinate r, and numerical implementations can be very 
fast (typically less than 1 minute per HFB calculation) [3]. In cylindrical symmetry, the wave- 
functions depend explicitly on the two variables z and p, and special separation techniques 
(B-splines or similar) must be employed. Codes with built-in parallel capabilities achieve good 
convergence for a few dozens of cores/HFB calculation in a few hours [5]. Full 3D solvers in 
coordinate space are in development: they will probably require a large number of cores (>1,000) 
to achieve convergence in less than a day. At this scale, the benefits of using DFT (reformulate 
the problem to replace the 3A coordinates of the nucleons by the only three coordinates of the 
density matrix) are greatly reduced though. 

The alternative approach consists of introducing a basis of the Hilbert space C2 of square- 
integrable functions and computing the matrix elements of all relevant operators in that basis. 
The Harmonic Oscillator (HO) basis proves the most adapted to nuclear structure applications 
as it involves very localized basis functions. By comparison, molecular physics applications 



often employ the plane wave basis. The choice of the coordinate system is dictated by the 
symmetries that one wants to impose, or relax, on the system. For example, fission studies 
typically require many spatial symmetries to be broken, and the Cartesian HO basis becomes 
the tool of choice. The HO basis is by definition an infinite countable basis of the one-particle 
Hilbert space: numerical implementations require the truncation of the expansion up to a 
maximum number of oscillator shells A^. This introduces systematic truncation errors, as well as 
an artificial dependence on the frequency loq of the HO. When studying very deformed nuclei, it 
is recommended to choose an anisotropic 2D HO, with uj± ^ wii. The final truncation error then 
depends also on the deformation of the basis, that is, the ratio q = uju/uj±. For any practical 
calculation, the final HFB energy E should therefore be written E{N,iOQ,q). High-precision 
calculations require systematic and costly studies of this model space dependence; see Figs. 
[TH2] [6] . Recent attempts have therefore been made to apply multiresolution methods based on 
wavelet expansions to nuclear DFT, in order to combine the versatility of basis expansion with 
arbitrary precision results [7]. 
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Figure 1. (color online) Conver- 
gence of the energy for various de- 
formed states in ^^'^Pu as function of 
the number of HO shells. The dashed 
line gives the result of [5j. 



Figure 2. (color online) Convergence of the energy 
for the ground-state of 208p|-, g^g function of the HO 
oscillator length b = y^mujQ/h for various number of 
oscillator shells. 



Dense Linear Algebra - When expressed in a single-particle basis, the density p{x, x') 
becomes an actual matrix pij defined as 



pij 



// dxdx'(f)*{x')p{x,x')(f>i{x). 






(6) 



A general two-body interaction becomes a rank-4 tensor Vabcd- The HF potential is obtained by 
taking a tensor contraction: 

^ac = / ^VgbcdPdb, (7) 

bd 

where the summation for each index extends over the size of the basis. Such tensor contractions 
must be performed at every iteration and represent an important bottleneck in the calculation. 
Indeed, for a heavy nucleus, the size of the HO basis for a precise calculation contains typically 



A^ > 20 shells. In a Cartesian basis, this implies that each index a, b, c, d is in fact a set of 
three numbers, a ^- Ua = {ux^a, ny^a, ^z^a) with Ux^a + %,a + f^z.a = A^. A naive implementation 
of Eq. ([7]) would therefore require a 12-nested loop to compute the entire matrix Tac- Most 
problematic, in double-precision arithmetic the size of the complex tensor Vabcd would be on 
the order of 80 TB after taking into account the anti-symmetry properties of Vabcd- Current 
implementations therefore do not store the matrix Vabcd but compute it on the fly, adding to the 
computational overhead, and store the matrix Tac instead (~ 50 MB storage). 

An alternative technique to compute Tac, which avoids handling directly the tensor Vabcd and 
is particularly efficient when the functional depends only on the local density matrix, relies on 
the fact that the functional dependence of T on the density matrix p is known. The HF potential 
for a local functional is simplified: T[p{x,x')] — ?> r[p(2;)]. The matrix Tac can then be computed 
by only one 3D integral: 



dx(j)*{x)T[p{x)](j)i{x). 



(8) 



Such integrations can be performed exactly by quadrature formulas. The only time-consuming 
part of this method is to obtain an expression of p{x) on the quadrature mesh, that is, 
Pi^k^ , Uky ,ZkJ- One must compute 



p{Xk,,yky,ZkJ = '^'^V^^Vr,f,ct>l,{xk^,yky,ZkJ(t)m{Xk^,yky,Zk, 



(9) 



mn /i 



This dense linear algebra requires at first sight 10-nested loops to build the entire representation 
of p on the integration grid. Various numerical tricks can be used to reduce this number [8]. 
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Figure 3. (color online) Convergence of 
the energy of two different nuclei with and 
without constraint on the quadrupole moment 
using standard linear mixing and the modified 
Broyden method [9]. 



Figure 4. (color online) Small-scale illustra- 
tion of convergence failure. The figure shows a 
2D cross-section of the potential energy land- 
scape in ^^''Pu. Black points are calculated 
points; missing points did not converge. 



Iterative Algorithms - The HFB equations are usually solved by iterations: starting 
at iteration with an initial guess for, for example, the density matrix p\J and pairing tensor 
, one constructs the HFB matrix M^^'; diagonalizing it gives the eigenvectors at iteration 



K; 



(0) 



(0) 



0, which are used to compute an updated version of the density matrix /Oq^^ and pairing tensor 
Kq^^; using these updates as input to iteration 1, one constructs the new HFB matrix M^^' , 



diagonalizes it, and so forth, until convergence is met. Formally, this can be written as 



V 



(m) 
out 



liV 



(m)x 
in / 



(10) 



The solution to the HFB equation satisfies: V = I{V), or equivalently F{V) = V — I{V) = 0. 
This is a form of the fixed-point problem. Most DFT solvers iterate V either with a standard 
linear mixing, 



V\ 



(m+l) 



aV^::^ + (1 - a)V 



(m) 



(11) 



or a more elaborate mixing like the modified Broyden mixing [H]. The final number of iterations 
needed to reach convergence is extremely dependent on the type of calculation: ground- 
state properties of a spherical nucleus may take as little as 30 iterations, while the scission 
configuration in ^^^Pu may take as much as 5,000 iterations [TO]. In addition, the iterative 
method often fails to converge, especially with large "exotic" constraints. Since the time of 
calculation is ultimately linearly proportional to the number of iterations, controlling the latter 
and ensuring a high convergence rate is critical for DFT applications. 
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Figure 5. (color online) Hybrid MPI/OpenMP programming model for large-scale DFT 
applications: the process grid is decomposed in MPI communicators made of a few nodes (1 - 
16), handling a given HFB calculation, with OpenMP threading within nodes. 



3.2. Large-Scale Applications 

In itself, one HFB calculation is almost always manageable on a standard computer. However, 
realistic applications always require the computation of a very large number of different 
configurations. The static description of nuclear fission is a good example: at least four 
deformation degrees of freedom are necessary — elongation, triaxiality, mirror symmetry, and 
neck size — ^just for calculations at zero temperature and zero angular momentum. A typical 
estimate for the number of points for each degree of freedom is 500 x 40 x 20 x 20 = 8.10^ points. 
At each point, one can estimate the error due to the truncation of the basis by repeating the 
same calculation with several different combinations of basis parameters: assuming 10 points 
per basis parameter, this adds a factor of 1,000. The typical size of the problem is therefore on 



the order of 10^ 10^^ independent calculations, each taking on the order of a few days on a 

single core for high-precision results. This estimate applies to a single nucleus only. 

Modern DFT solvers have therefore adopted a hybrid MPI/OpenMP programming model, 
illustrated in Fig. [5j Since a large number of HFB calculations is needed for any realistic 
problem, the process grid is decomposed into many small MPI communicators, each in charge 
of handling one HFB task, and possibly spanning multiple nodes. In order to accelerate dense 
linear algebra operations, OpenMP threading is used within a node. In many applications, 
only one MPI task per node is devoted to an HFB calculation. With existing solvers, the 
number of files needed to dump the output of the calculation grows as the number of HFB 
configurations handled: navigating this large mass of data and extracting the most relevant 
information pertaining to the problem at hand can be tricky. Some effort has therefore been 
put into the development of interfaces with advanced data mining software [11] . 



4. Recent Achievements 

Collaborative efforts such as the SciDAC 2 UNEDF project have enabled ground-breaking 
optimizations of nuclear DFT solvers [12], which in turn have led to important discoveries 
in several areas of the physics of heavy nuclei. We highlight below two examples of recent work. 
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Figure 6. (color online) Differences be- 
tween theoretical and experimental atomic 
masses for SLy4 (gray) and UNEDF 1 (red) 
parametrization. Each line corresponds to an 
isotopic sequence. Adapted from [13] . 



Figure 7. (color online) Sensitivity of 
the parameters of the Skyrme functional to 
various types of experimental data included in 
the optimization. Adapted from ^3j. 



Optimization of Energy Functionals - The only input to nuclear DFT is the dozen 
or so low-energy constants characterizing the energy functionals. These parameters need to be 
carefully adjusted to experimental data. In the past, this procedure was usually carried out 
for specific systems such as infinite nuclear matter or doubly magic spherical nuclei, essentially 
because calculations are fast for those cases. However, most realistic nuclei are significantly 
different from such idealized systems, and it has been realized that many energy functionals 
suffer from systematic biases. The availability of heavily optimized DFT solvers together with 
leadership-class computers has allowed parameter optimization to be performed in realistic 
nuclei, that is, deformed nuclei with pairing. Moreover, statistical methods can now be applied 
at the solution to investigate the sensitivity of the solution to the experimental data, as well as 



built-in correlations between the parameters. Such modern methods have shed new light on the 
validity of current functionals and are now being applied to new generations of functionals |13j . 
Description of the Fission Process - The successful description of the fission process 
in the framework of DFT is a poster-child example of a large-scale computational problem 
involving nuclear DFT that could have tremendous applications for society. Most of the recent 
progress in the field has come from the computational side. In particular, the first systematic 
self-consistent survey of fission pathways with several shape degrees of freedom has been carried 
out in the region of the heaviest elements. Calculated lifetimes are in reasonable agreement with 
experiment; see Fig. [8] [H]. In parallel, preliminary studies of compound nucleus fission have 
shed new light on the probability of formation for superheavy elements in fusion reactions; see 

Fig. El [mile]. 
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Figure 8. (color online) Theoretical and 
experimental fission half-lives of even-even 
Fermium isotopes. 
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Figure 9. (color online) Symmetric isentropic 
fission pathways of ^^^Fm (energy normalized 
to the ground-state) including triaxial degrees 
of freedom (dashed lines). 



5. Moving For^vard 

Computing Excited States - Most of the examples presented in this article correspond 
to the ground-state properties of nuclei. A significant challenge to DFT is its ability to also 
describe excited states. In the version of DFT that derives from a two-body effective interaction, 
indications are that a three-body force will have to be explicitly included. Doing so would enable 
to apply a well-established set of techniques such as projection and the generator coordinate 
method that can provide excited spectra. However, these methods will require yet another 
leap in the number of HFB points to be computed: for example, tensor contractions with a 
three-body force will increase from 12 to 18 the number of nested loops needed to compute the 
mean-field Tac in Cartesian coordinates. 

Advanced Data Management - Currently, all DFT solvers have a rather simple I/O 
system, which essentially relies on native Fortran or C/C++ routines for disc access. Files are 
written on a per core basis: in large-scale applications, the number of files becomes huge and its 
management rather complex, and the scalability may degrade quickly as one hits the limits of 
the operating system; see Fig. [TOl It seems therefore necessary to invest into more efficient I/O 
systems, possibly interfaced with professional database management and data-mining software. 
Indeed, a specific feature of DFT is that it produces a lot of data points that need to be analyzed 
in many different ways. 



Real-time Simulation Steering - Current large-scale simulations are intrinsically static: 
given a set of input data shared among processes, each group of cores performs its task 
until completion. However, entire regions of potential energy surfaces irrelevant for physics 
applications cannot be detected until postanalysis is performed; calculations that failed could 
be converged with, for example, slightly different mixing parameters; model space dependence 
could be efficiently studied by optimizaton over the basis parameters, rather than meshing the 
parameter space. All these observations point to the need of dynamically steering the simulation 
based on a set of preliminary results. However, such a program can be viable only if the time 
of one HFB calculation can be reduced to at least less than 1 hour. 
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Figure 10. (color online) Scaling of 
the DFT solver HFODD for identical 
calculations run in parallel. Most of the 
load- imbalance and imperfect scaling comes 
from I/O. From [12]. 



Figure 11. (color online) Time of 

calculation for six HFB iterations depending 
on the ScaLAPACK diagonalization routine 
and the process grid (matrices of rank 2760) . 



Parallel Dense Linear Algebra Libraries - The reduction of the typical computation 
time below the 1-hour barrier is not possible without the development of highly optimized parallel 
dense linear algebra libraries. The current ScaLAPACK library provides a good starting point, 
but many routines do not reach the same level of performance as the original LAPACK versions. 
Currently, the gain is probably not sufficient for many practical application; see Fig. [TTJ 



6. Conclusions 

Nuclear DFT is the only theoretical framework that can be applied to all nuclei from the lightest 
to the heaviest, including stellar environments. While relatively mature, the theory has only 
recently started to benefit from the availability of leadership-class computers and advances in 
code development. Significant progress has been achieved, in particular in terms of parameter 
optimization and specific applications such as fission or large-scale surveys [T7]. It is reasonable 
to anticipate that numerical uncertainties due to the truncation of the model space (HO basis) 
and the collective space (number of deformations) could be virtually eliminated in the near 
future, which would then open the door to high-precision nuclear simulations of importance to 
science and society. Such a promise, though, can be delivered only by the joint effort of both 
nuclear scientists and computer scientists. On the computational side, some of the key aspects 
are the parallelization of dense linear algebra operations, the development of real-time simulation 
steering tools, and the implementation of scalable I/O models and data management tools. 
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